|
RELIEF is a feature selection algorithm used in binary classification (generalisable to polynomial classification by decomposition into a number of binary problems) proposed by Kira and Rendell in 1992.〔Kira, Kenji and Rendell, Larry (1992). (The Feature Selection Problem: Traditional Methods and a New Algorithm ). AAAI-92 Proceedings.〕 Its strengths are that it is not dependent on heuristics, runs in low-order polynomial time, and is noise-tolerant and robust to feature interactions, as well as being applicable for binary or continuous data; however, it does not discriminate between redundant features, and low numbers of training instances fool the algorithm. Kononenko et al. proposed some updates to the algorithm (RELIEFF) in order to improve the reliability of the probability approximation, make it robust to incomplete data, and generalising it to multi-class problems.〔Kononenko, Igor et al. Overcoming the myopia of inductive learning algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55〕 ==RELIEF== Take a data set with ''n'' instances of ''p'' features, belonging to two known classes. Within the data set, each feature should be scaled to the interval (1 ) (binary data should remain as 0 and 1). The algorithm will be repeated ''m'' times. Start with a ''p''-long weight vector (W) of zeros. At each iteration, take the feature vector (X) belonging to one random instance, and the feature vectors of the instance closest to X (by Euclidean distance) from each class. The closest same-class instance is called 'near-hit', and the closest different-class instance is called 'near-miss'. Update the weight vector such that : Thus the weight of any given feature decreases if it differs from that feature in nearby instances of the same class more than nearby instances of the other class, and increases in the reverse case. After ''m'' iterations, divide each element of the weight vector by ''m''. This becomes the relevance vector. Features are selected if their relevance is greater than a threshold ''τ''. Kira and Rendell's experiments〔Kira, Kenji and Rendell, Larry (1992) (A Practical Approach to Feature Selection ), Proceedings of the Ninth International Workshop on Machine Learning, p249-256〕 showed a clear contrast between relevant and irrelevant features, allowing ''τ'' to be determined by inspection. However, it can also be determined by Cebysev's inequality for a given confidence level (''α'') that a ''τ'' of 1/sqrt(α *m) is good enough to make the probability of a Type I error less than ''α'', although it is stated that ''τ'' can be much smaller than that. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Relief (feature selection)」の詳細全文を読む スポンサード リンク
|